{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Protein Folding\n",
    "\n",
    "## Objective and Prerequisites\n",
    "\n",
    "Hone your modeling skills with this challenging Protein Folding problem. We’ll show you how to create a binary optimization model of the problem with the Gurobi Python API and then solve it using the Gurobi Optimizer.\n",
    "\n",
    "This model is example 28 from the fifth edition of Model Building in Mathematical Programming by H. Paul Williams on pages 289-290 and 344-345.\n",
    "\n",
    "This modeling example is at the advanced level, where we assume that you know Python and the Gurobi Python API and that you have advanced knowledge of building mathematical optimization models. Typically, the objective function and/or constraints of these examples are complex or require advanced features of the Gurobi Python API.\n",
    "\n",
    "**Download the Repository** <br /> \n",
    "You can download the repository containing this and other examples by clicking [here](https://github.com/Gurobi/modeling-examples/archive/master.zip). \n",
    "\n",
    "**Gurobi License** <br /> \n",
    "In order to run this Jupyter Notebook properly, you must have a Gurobi license. If you do not have one, you can request an [evaluation license](https://www.gurobi.com/downloads/request-an-evaluation-license/) as a *commercial user*, or download a [free license](https://www.gurobi.com/academia/academic-program-and-licenses/) as an *academic user*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Problem Description\n",
    "\n",
    "The problem described in this Jupyter Notebook is based on a molecular biology problem that is discussed in a paper entitled “Quadratic Binary Programming Models in Computational Biology” by Forrester and Greenberg (2008). The problem pertains to a protein, which consists of a chain of amino acids. In this problem, the amino acids come in two forms: hydrophilic (waterloving) and hydrophobic (water hating). An example of such a chain is given in the following figure with the hydrophobic acids marked in bold.\n",
    "\n",
    "![chain](chain.PNG)\n",
    "\n",
    "Such a chain naturally folds so as to bring as many hydrophobic acids as possible close together. A folding for the chain, in two dimensions, is given in the following figure, with the new matches marked by dashed lines. Our objective is to predict the optimum folding.\n",
    "\n",
    "![folding](folding.PNG)\n",
    "\n",
    "To solve the problem posed here, we must find the optimum folding for a chain of 50 amino acids with hydrophobic acids at positions 2, 4, 5, 6, 11, 12, 17, 20, 21, 25, 27, 28, 30, 31, 33, 37, 44 and 46."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Model Formulation\n",
    "\n",
    "### Sets and Indices\n",
    "\n",
    "$k \\in A =\\{1,2,...,50\\}$: Chain of aminoacids.\n",
    "\n",
    "$i,j \\in H =\\{2,4,5,6,11,12,17,20,21,25,27,28,30,31,33,37,44,46\\} \\subseteq Aminoacids $: Subset of aminoacids that are hydrophobic.\n",
    "\n",
    "### Decision variables\n",
    "\n",
    "$\\text{match}_{i,j} \\equiv x_{i,j} = 1$, iff hydrophobic acid $i$ is matched with acid $j$, for all hydrophobic acids $ i < j \\in H$. This matching does not include those matchings that are predefined by virtue of the acids being contiguous in the chain, i.e. $j > i+1$).\n",
    "\n",
    "$\\text{fold}_{k} \\equiv y_{k} = 1$, iff a fold occurs between the $i$th and $(i +1)$st acids in the chain.\n",
    "\n",
    "### Constraints\n",
    "\n",
    "For each pair of hydrophobic acids $i$ and $j$, we can match them if: \n",
    "* they are not contiguous, that is, already matched, \n",
    "* they have an even number of acids between them in the chain,\n",
    "* and there is exactly one fold between $i$ and $j$ . \n",
    "\n",
    "This gives rise to the following constraints.\n",
    "\n",
    "1. $y_{k} + x_{i,j} \\leq 1, \\; \\forall k \\in A, (i,j) \\in H, \\; \\text{such that} \\; i \\leq k < j, \\; \\text{and} \\; \n",
    "k \\neq (i+j-1)/2$.\n",
    "2. $x_{i,j} \\leq y_{k}, \\; \\text{where} \\; k = (i+j-1)/2 $.\n",
    "\n",
    "Let $\\text{H_fold} = \\{(i,j) \\in H: x_{i,j} \\leq y_{k}, \\; k = (i+j-1)/2  \\}$ be the set of hydrophobic acids for which there is a folding that enables the matching.\n",
    "\n",
    "### Objective function\n",
    "The objective is to maximize the number of matchings of hydrophobic acids.\n",
    "\n",
    "$$\n",
    "\\sum_{i,j \\in \\text{H_fold}} x_{i,j}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Python Implementation\n",
    "\n",
    "We import the Gurobi Python Module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install gurobipy \n",
    "# Note, the restricted license is not sufficient to run this notebook, a full license is needed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gurobipy as gp\n",
    "from gurobipy import GRB\n",
    "\n",
    "# tested with Python 3.11 & Gurobi 11.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Input Data "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# list of aminoacids and hydrophobic acids\n",
    "\n",
    "acids = [*range(1,51)]\n",
    "\n",
    "h_phobic = [2,4,5,6,11,12,17,20,21,25,27,28,30,31,33,37,44,46]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preprocessing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating the data structures to generate the model\n",
    "list_ij = []\n",
    "\n",
    "# Indices of hydrophobic acids that can be matched\n",
    "for i in h_phobic:\n",
    "    for j in h_phobic:\n",
    "        if j > i + 1:\n",
    "            tp = i,j\n",
    "            list_ij.append(tp)\n",
    "            \n",
    "ij = gp.tuplelist(list_ij)\n",
    "\n",
    "###\n",
    "list_ik1j = []\n",
    "\n",
    "list_ik2j = []\n",
    "\n",
    "for i,j in ij:\n",
    "    for k in range(i,j):\n",
    "        if (k == (i+j-1)/2  ):\n",
    "            tp = i,j,k\n",
    "            list_ik2j.append(tp)\n",
    "        else:\n",
    "            tp = i,j,k\n",
    "            list_ik1j.append(tp)\n",
    "\n",
    "# Indices for constraints of type 2\n",
    "ik2j = gp.tuplelist(list_ik2j)\n",
    "\n",
    "# Indices for constraints of type 1\n",
    "ik1j = gp.tuplelist(list_ik1j)\n",
    "\n",
    "# Matchings that are enabled by a folding\n",
    "list_ijfold = []\n",
    "\n",
    "for i,j,k in ik2j:\n",
    "    tp = i,j\n",
    "    list_ijfold.append(tp)\n",
    "\n",
    "ijfold = gp.tuplelist(list_ijfold)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Deployment\n",
    "We create a model and the decision variables. There are two types of decision variables: the variables that determine which hydrophobic acids to match, and the variables that determine at which amino acid the folding of the protein happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using license file c:\\gurobi\\gurobi.lic\n"
     ]
    }
   ],
   "source": [
    "model = gp.Model('ProteinFolding')\n",
    "\n",
    "# Matching variables\n",
    "match = model.addVars(ij, vtype=GRB.BINARY, name=\"match\")\n",
    "\n",
    "# Folding variables\n",
    "fold = model.addVars(acids, vtype=GRB.BINARY, name=\"fold\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Folding and matching constraints\n",
    "\n",
    "1. $y_{k} + x_{i,j} \\leq 1, \\; \\forall k \\in A, (i,j) \\in H, \\; \\text{such that} \\; i \\leq k < j, \\; \\text{and} \\; \n",
    "k \\neq (i+j-1)/2$.\n",
    "2. $x_{i,j} \\leq y_{k}, \\; \\text{where} \\; k = (i+j-1)/2 $.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Constraint 1:\n",
    "\n",
    "C1 = model.addConstrs( (fold[k] + match[i,j] <= 1 for i,j,k in ik1j ) , name='C1')\n",
    "\n",
    "# Constraint 2:\n",
    "\n",
    "C2 = model.addConstrs( ( match[i,j] <= fold[k]  for i,j,k in ik2j ) , name='C2')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Objective function\n",
    "\n",
    "Maximize the matchings of hydrophobic acids. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Objective function\n",
    "\n",
    "model.setObjective(gp.quicksum(match[i,j] for i,j in ijfold) , GRB.MAXIMIZE )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Gurobi Optimizer version 9.1.0 build v9.1.0rc0 (win64)\n",
      "Thread count: 4 physical cores, 8 logical processors, using up to 8 threads\n",
      "Optimize a model with 2441 rows, 197 columns and 4882 nonzeros\n",
      "Model fingerprint: 0x7a3a8e69\n",
      "Variable types: 0 continuous, 197 integer (197 binary)\n",
      "Coefficient statistics:\n",
      "  Matrix range     [1e+00, 1e+00]\n",
      "  Objective range  [1e+00, 1e+00]\n",
      "  Bounds range     [1e+00, 1e+00]\n",
      "  RHS range        [1e+00, 1e+00]\n",
      "Found heuristic solution: objective -0.0000000\n",
      "Presolve removed 1596 rows and 105 columns\n",
      "Presolve time: 0.02s\n",
      "Presolved: 845 rows, 92 columns, 1779 nonzeros\n",
      "Variable types: 0 continuous, 92 integer (92 binary)\n",
      "\n",
      "Root relaxation: objective 3.200000e+01, 90 iterations, 0.00 seconds\n",
      "\n",
      "    Nodes    |    Current Node    |     Objective Bounds      |     Work\n",
      " Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time\n",
      "\n",
      "     0     0   32.00000    0   89   -0.00000   32.00000      -     -    0s\n",
      "H    0     0                       8.0000000   32.00000   300%     -    0s\n",
      "H    0     0                       9.0000000   32.00000   256%     -    0s\n",
      "     0     0   21.00000    0   69    9.00000   21.00000   133%     -    0s\n",
      "     0     0   13.50000    0   81    9.00000   13.50000  50.0%     -    0s\n",
      "     0     0   12.50000    0   82    9.00000   12.50000  38.9%     -    0s\n",
      "     0     0   12.50000    0   82    9.00000   12.50000  38.9%     -    0s\n",
      "     0     0   12.00000    0   82    9.00000   12.00000  33.3%     -    0s\n",
      "     0     0   12.00000    0   82    9.00000   12.00000  33.3%     -    0s\n",
      "H    0     0                      10.0000000   11.00000  10.0%     -    0s\n",
      "     0     0   11.00000    0   76   10.00000   11.00000  10.0%     -    0s\n",
      "     0     0   11.00000    0   67   10.00000   11.00000  10.0%     -    0s\n",
      "     0     2   11.00000    0   67   10.00000   11.00000  10.0%     -    0s\n",
      "\n",
      "Cutting planes:\n",
      "  Gomory: 2\n",
      "  Zero half: 17\n",
      "  RLT: 38\n",
      "\n",
      "Explored 55 nodes (1222 simplex iterations) in 0.21 seconds\n",
      "Thread count was 8 (of 8 available processors)\n",
      "\n",
      "Solution count 4: 10 9 8 -0 \n",
      "\n",
      "Optimal solution found (tolerance 1.00e-04)\n",
      "Best objective 1.000000000000e+01, best bound 1.000000000000e+01, gap 0.0000%\n"
     ]
    }
   ],
   "source": [
    "# Verify model formulation\n",
    "\n",
    "model.write('ProteinFolding.lp')\n",
    "\n",
    "# Run optimization engine\n",
    "\n",
    "model.optimize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimal number of hydrophobic acids matchings: 10.0\n",
      "_______________________________________\n",
      "Optimal matching of hydrophobic acids.\n",
      "_______________________________________\n",
      "Hydrophobic acid matching (2, 5) with folding at amonacid 3.\n",
      "Hydrophobic acid matching (5, 12) with folding at amonacid 8.\n",
      "Hydrophobic acid matching (6, 11) with folding at amonacid 8.\n",
      "Hydrophobic acid matching (12, 17) with folding at amonacid 14.\n",
      "Hydrophobic acid matching (17, 20) with folding at amonacid 18.\n",
      "Hydrophobic acid matching (20, 25) with folding at amonacid 22.\n",
      "Hydrophobic acid matching (25, 28) with folding at amonacid 26.\n",
      "Hydrophobic acid matching (28, 31) with folding at amonacid 29.\n",
      "Hydrophobic acid matching (31, 46) with folding at amonacid 38.\n",
      "Hydrophobic acid matching (33, 44) with folding at amonacid 38.\n"
     ]
    }
   ],
   "source": [
    "# Output report\n",
    "\n",
    "print(f\"Optimal number of hydrophobic acids matchings: {model.objVal}\")\n",
    "\n",
    "print(\"_______________________________________\")\n",
    "print(f\"Optimal matching of hydrophobic acids.\")\n",
    "print(\"_______________________________________\")\n",
    "\n",
    "for i,j,k in ik2j:\n",
    "    if (match[i,j].x > 0.5): \n",
    "        print(f\"Hydrophobic acid matching {i,j} with folding at amonacid {k}.\")\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## References\n",
    "\n",
    "H. Paul Williams, Model Building in Mathematical Programming, fifth edition.\n",
    "\n",
    "Forrester, R.J. and Greenberg, H.J. (2008) Quadratic Binary Programming Models in Computational Biology. Algorithmic Operations Research, 3, 110–129.\n",
    "\n",
    "Copyright © 2020 Gurobi Optimization, LLC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
